Abstract. In this paper, we consider the problem of improving the goal-achievement
performance of an agent that acts in a partially observable, dynamic environment and
that may or may not know all of the events that can happen in that environment. Such
an agent cannot reliably predict future events and observations. However, given event
models for some of the events that occur, it can improve its predictions of future
states by conducting an explanation process that reveals unobserved events and facts
that were true at some time in the past. We describe the DiscoverHistory algorithm
for discovering an explanation for a series of observations in the form of an event
history and a set of assumptions about the initial state. When knowledge of one or
more event models is not present, we claim that the capability to learn these unknown
event models would improve the performance of an agent using DiscoverHistory, and we
provide experimental evidence to support this claim. We describe this learning problem
and suggest how the DiscoverHistory algorithm can be used in the learning process.
1 Introduction

Recent research in robotics has demonstrated the need for intelligent agents that can
recognize dangers to themselves and respond proactively, before humans can communicate
their needs. For example, the Mars rovers must navigate for many hours at a time
without a human in the loop, and the current generation of autonomous underwater
vehicles can operate for similar periods of time. To robustly recognize emerging dangers,
these types of agents must reason about the state information that they can observe, and
continuously learn new information about how the world works.

Most commonly, agents that reason about many possible states use a belief state
representation (Russell and Norvig, 2003), in which an agent plans over the set of
“belief states”, which is the power set of standard states. While effective in small domains,
the size of the belief state in use grows exponentially with the number of hidden literals.
We consider instead a model in which the agent begins by making assumptions about
hidden facts in the world (most often a closed world assumption), and then explains
what has happened when unexpected state observations occur. This is made possible
by modeling the types of exogenous events that can occur, and determining what
sequences of such events are consistent with the observations. Unlike work in diagnosis
(e.g., McIlraith 1998; Iwan and Lakemeyer 2003), we treat these events as
uncontrollable; that is, the rules of the world require them to happen when certain conditions
are met. This treatment is more realistic for many environments and, by narrowing the
set of possible explanations, it allows an agent to correct initial assumptions about the
world. An agent can use these events in planning as well as in explanation, which allows
explanations to improve predictions made in the future.

We previously created an algorithm, DiscoverHistory, that corrects assumptions
in dynamic, partially observable environments (Molineaux et al., in submission).
It assumes that the space of possible events is fully available to the agent. However,
this assumption is too strong to meet the needs of real-world agents that must operate
on their own. How should a Mars rover respond, for example, if it unexpectedly
encounters a Martian parasite? Clearly its designers cannot plan for all possible events, so
the robot must respond without a good description of how such parasites behave.
Ideally, it should observe the parasite’s behaviors and explain what events cause them,
thus learning its own model of environment events.

This paper is organized as follows. Section 2 surveys related work in explanation
and diagnosis. Section 3 reviews a formalism for explanation of the past and
DiscoverHistory. Section 4 introduces an environment designed to examine its behavior,
while Section 5 describes an empirical study that examines the benefits of improving
the environment model used by DiscoverHistory. Section 6 sketches an algorithm
for learning event models, and Section 7 concludes.


2  Related  Work 

Our work extends the work described by Molineaux et al. (in submission), who
introduced DiscoverHistory but, unlike here, assumed that the agent is given knowledge
of the complete space of possible events. Here we focus on explaining unexpected states
by learning the models of undetected events. That is, the role of explanations in this
paper is to help an agent to understand its environment, rather than communicate its
reasoning to humans. This differs from other visions of the role of explanations in AI
systems, such as explanation-aware computing, which concerns software-user interactions
(e.g., Atzmueller and Roth-Berghofer, 2010). Likewise, self-explaining agents (e.g.,
Harbers et al., 2010) and explainable AI systems (van Lent et al., 2004) can generate
explanations of their behavior, again for human consumption. While several methods
have been investigated that allow agents to explain their observations (e.g., Schank and
Leake (1989) describe a case-based approach), we are not aware of similar approaches
that learn event models by abducing explanations from unexpected state changes.

While many automated planners can operate in partially observable dynamic
environments, most ignore this challenging learning problem. Rather than attempt to
discover the causes of the unexpected state, they instead dynamically replan from it (e.g.,
Myers, 1999; Ayan et al., 2007; Yoon et al., 2007). Those few exceptions that do learn
tend to focus on other learning tasks, such as selecting which plans to apply (Corchado
et al., 2008), reducing the time to generate optimal plans when replanning (Rachelson
et al., 2010), or reducing the need for future replanning tasks (Jimenez et al., 2008).

Reinforcement learning techniques can be used to learn policies for solving
planning problems in partially observable environments, such as by modeling tasks as
Partially Observable Markov Decision Processes (POMDPs) (Russell and Norvig, 2003).
Our work differs because we use a relational representation for environment models
that permits greater scalability and greater generalization.

Using an extension of the situation calculus, McIlraith (1998) describes an approach
that uses goal regression and theorem proving to generate diagnostic explanations,
defined as conjectured sequences of actions that caused an observed but unexpected
behavior. Iwan and Lakemeyer (2003) addressed related problems using a similar
approach, though they did not distinguish actions from events, they focused on single
observations, and they did not empirically test an implementation of their formalization.
Sohrabi et al. (2010) followed up with an empirical analysis of planning techniques
to perform diagnosis, and extended the implementation to also find a consistent set of
initial assumptions. Their problem is similar to ours, except that they treat events as
actions, which allows them to occur without a strict causal relationship to a prior event.
Furthermore, these systems are not incorporated into a plan-executing agent.

ASAS (Josephson and Bharathan 2006) is a “dynamic abducer” that can continuously
maintain an explanation of its environment and revise its beliefs. However, it is
not clear whether this process of belief revision is used to improve future predictions.

Research  on  learning  goals  (Cox  and  Ram,  1999;  Ram  and  Leake,  1995)  is  also 
related  in  that  explanation  failures  are  used  to  trigger  learning.  However,  these  goals 
are  not  related  to  the  problem  of  learning  event  models  or  choosing  actions. 


3  Planning  and  Explanation 


We are interested in algorithms that achieve goals in partially observable, dynamic
domains by executing a sequence of actions. We model the task of finding a sequence of
actions to accomplish a goal as a planning problem, and employ a Hierarchical Task
Network planner, SHOP2, for this purpose (Nau et al., 2003). However, since our
domains are partially observable and dynamic, the world is not entirely predictable. Rather
than enumerating all possible states resulting from its actions, which is highly
computationally expensive, our agent makes assumptions about the environment and creates
a plan that will accomplish its goals if those assumptions hold. During execution, if
the agent’s assumptions are incorrect, the agent will observe facts that do not match its
predictions. We call this difference between predictions and observations a discrepancy
or anomaly. Section 3.1 presents our definition of explanations for these observations.

To recover from anomalies, we use DiscoverHistory, which corrects the agent’s
assumptions about the environment by searching for an explanation of the anomaly.
This explanation will include a revised set of assumptions about the world, which can
then be used to improve the next planning iteration. Better informed about
the environment, the planner can now create a better plan. Section 3.2 summarizes the
DiscoverHistory algorithm.


3.1  Definitions 


We use the standard definitions from classical planning for variable and constant
symbols, logical predicates and atoms, literals, groundings of literals, propositions, and
actions (Nau et al., 2003). We assume a state is encoded by a set of logical propositions.

Let $V$ be the finite set of all possible propositions for describing an environment.
An environment is partially observable if an agent only has access to it through
observations that do not cover the complete state, where an observation is encoded as a set of
propositions. We let $V_{obs}$ be the set of all propositions that the agent can observe in the
world. An observation associates a truth value with each of these propositions. Further,
let $V_{hidden}$ be the set of hidden (state) propositions that an agent cannot observe; for
example, a robot’s exact location may be hidden to it if it has no GPS contact.

An  event  template  is  defined  syntactically  the  same  as  a  classical  planning  operator: 
(name,  preconds,  effects),  where  name,  the  name  of  the  event,  is  a  literal,  and  preconds 
and  effects,  the  preconditions  and  effects  of  the  event,  are  sets  of  literals.  We  assume 
that  an  event  always  occurs  immediately  when  all  of  its  preconditions  are  met  in  the 
world. 
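
As a rough illustration of this definition, the sketch below encodes an event template as a (name, preconds, effects) triple and fires it as soon as its preconditions hold. The Python types, the helper names (holds, triggered, apply_effects), the closed-world state encoding, and the pit-traps-rover event are all assumptions made for this example; they are not taken from the DiscoverHistory implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A literal is (proposition, truth value); a state is the set of true
# propositions, with everything else assumed false (closed world).
Literal = Tuple[str, bool]
State = FrozenSet[str]


@dataclass(frozen=True)
class EventTemplate:
    name: str
    preconds: FrozenSet[Literal]
    effects: FrozenSet[Literal]


def holds(literal: Literal, state: State) -> bool:
    prop, value = literal
    return (prop in state) == value


def triggered(event: EventTemplate, state: State) -> bool:
    """An event must occur as soon as all of its preconditions are met."""
    return all(holds(lit, state) for lit in event.preconds)


def apply_effects(event: EventTemplate, state: State) -> State:
    """Return the state that results from applying the event's effects."""
    result = set(state)
    for prop, value in event.effects:
        if value:
            result.add(prop)
        else:
            result.discard(prop)
    return frozenset(result)


# Hypothetical event: a rover that is in a pit gets its wheels stuck.
pit_traps_rover = EventTemplate(
    name="pit-traps-rover",
    preconds=frozenset({("rover-in-pit", True), ("wheels-stuck", False)}),
    effects=frozenset({("wheels-stuck", True)}),
)

state = frozenset({"rover-in-pit"})
if triggered(pit_traps_rover, state):
    state = apply_effects(pit_traps_rover, state)
print(sorted(state))  # ['rover-in-pit', 'wheels-stuck']
```

Including the negative precondition on wheels-stuck keeps the hypothetical event from firing again once its effect already holds.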

We formalize the planning agent’s knowledge about the changes in its environment
as an explanation of the world. We define a finite set of symbols
$T = \{t_0, t_1, t_2, \ldots, t_n\}$, called occurrence points. An ordering relation between two
occurrence points is denoted as $t_i \prec t_j$, where $t_i, t_j \in T$.

There  are  three  types  of  occurrences.  An  observation  occurrence  is  a  pair  of  the 
form  (obs,  t)  where  obs  is  an  observation.  An  action  occurrence  is  a  pair  of  the  form 
(a,  t)  where  a  is  an  action  (performed  by  the  agent).  Finally,  an  event  occurrence  is  a 
pair  (e,  t)  where  e  is  an  event.  In  all  of  the  occurrence  forms,  t  is  an  occurrence  point. 
Given an occurrence o, we define occ as a function such that $\mathrm{occ}(o) \mapsto t$; that is, occ
refers to the occurrence point t of any observation, action, or event.

An execution history is a finite sequence of observations and actions
$obs_0, a_1, obs_1, a_2, \ldots, a_k, obs_{k+1}$. A planning agent’s explanation of the world given an
execution history is a tuple $\chi = (C, R)$ such that C is a finite set of occurrences that includes the
execution history and zero or more event occurrences that happened according to that
explanation. R is a partial ordering over a subset of C, described by ordering relations
$\mathrm{occ}(o_i) \prec \mathrm{occ}(o_j)$ such that $o_i, o_j \in C$. As a shorthand, we sometimes will say $o_i \prec o_j$,
which is true when $\mathrm{occ}(o_i) \prec \mathrm{occ}(o_j)$.

We use the definitions knownbefore(p, o) and knownafter(p, o) to refer to the truth
of a proposition p before or after an occurrence o occurs. Let o be an action or event
occurrence. Then, the relation knownbefore(p, o) is true when $p \in \mathrm{preconds}(o)$.
Similarly, the relation knownafter(p, o) is true when $p \in \mathrm{effects}(o)$. If o is an observation
occurrence and $p \in obs$, then both knownbefore(p, o) and knownafter(p, o) hold.

We say that an occurrence o is relevant to a proposition p if the following holds:
$\mathrm{relevant}(p, o) \equiv \mathrm{knownafter}(p, o) \lor \mathrm{knownafter}(\neg p, o) \lor \mathrm{knownbefore}(p, o) \lor \mathrm{knownbefore}(\neg p, o)$.
We use the predicates prior(o, p) and next(o, p) to refer to the
prior and next occurrence relevant to a proposition p. That is to say,
$\mathrm{prior}(o, p) = \{o' \mid \mathrm{relevant}(p, o') \land \neg\exists o'' \text{ s.t. } \mathrm{relevant}(p, o'') \land o' \prec o'' \prec o\}$. Similarly,
$\mathrm{next}(o, p) = \{o' \mid \mathrm{relevant}(p, o') \land \neg\exists o'' \text{ s.t. } \mathrm{relevant}(p, o'') \land o \prec o'' \prec o'\}$.


The proximate cause of an event occurrence (e, t) is an occurrence o that satisfies
the following three conditions with respect to some proposition p: (1) $p \in \mathrm{preconds}(e)$,
(2) knownafter(p, o), and (3) there is no other occurrence o' such that $o \prec o' \prec (e, t)$.
Every event occurrence (e, t) must have at least one proximate cause, meaning that by
condition 3, every event occurrence (e, t) must occur immediately after its
preconditions are satisfied.

An inconsistency is a tuple (p, o, o') where o and o' are two occurrences in $\chi$ such
that $\mathrm{knownafter}(\neg p, o)$, $\mathrm{knownbefore}(p, o')$, and there is no other occurrence o'' such
that $o \prec o'' \prec o' \in R$ and o'' is relevant to p.
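
The following sketch shows one way the inconsistency test could be coded. It is a simplification under stated assumptions: occurrence points are totally ordered integers rather than partially ordered, each occurrence simply carries the literals known before and after it, and the names Occurrence, observation, and find_inconsistencies, as well as the at-b proposition and the effect given to storm-blows, are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Literal = Tuple[str, bool]  # (proposition, truth value)


@dataclass(frozen=True)
class Occurrence:
    label: str
    time: int  # occurrence point; a total order is assumed for simplicity
    before: FrozenSet[Literal] = frozenset()  # literals known just before
    after: FrozenSet[Literal] = frozenset()   # literals known just after


def observation(label: str, time: int, *literals: Literal) -> Occurrence:
    """Observed literals are known both before and after the observation."""
    lits = frozenset(literals)
    return Occurrence(label, time, lits, lits)


def relevant(prop: str, occ: Occurrence) -> bool:
    return any(p == prop for p, _ in occ.before | occ.after)


def find_inconsistencies(occs: List[Occurrence]) -> List[Tuple[Literal, str, str]]:
    """Return (p, o, o') triples: not-p is known after o, p is known before
    o', and no occurrence relevant to p lies strictly between them."""
    ordered = sorted(occs, key=lambda o: o.time)
    found = []
    for i, first in enumerate(ordered):
        for prop, value in sorted(first.after):
            for later in ordered[i + 1:]:
                if later.time > first.time and relevant(prop, later):
                    if (prop, not value) in later.before:
                        found.append(((prop, not value), first.label, later.label))
                    break  # only the next relevant occurrence matters
    return found


# The rover expected to arrive at location b, but later observes otherwise.
move = Occurrence("move-north", 1, after=frozenset({("at-b", True)}))
obs = observation("obs-3", 3, ("at-b", False))
print(find_inconsistencies([move, obs]))
# [(('at-b', False), 'move-north', 'obs-3')]

# Hypothesizing an intervening event that moved the rover removes it.
blown = Occurrence("storm-blows", 2, after=frozenset({("at-b", False)}))
print(find_inconsistencies([move, blown, obs]))  # []
```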

An explanation $\chi = (C, R)$ is plausible if and only if the following holds:

1. There are no inconsistencies in $\chi$;

2. Every event occurrence $(e, t) \in \chi$ has a proximate cause in $\chi$;

3. Simultaneous occurrences cannot contradict one another: for each pair $o, o' \in C$ such
that $\mathrm{occ}(o) = \mathrm{occ}(o')$, and for all p, $\mathrm{knownafter}(p, o) \Rightarrow \neg\mathrm{knownafter}(\neg p, o')$,
and $\mathrm{knownbefore}(p, o) \Rightarrow \neg\mathrm{knownbefore}(\neg p, o')$.

4. If preconds(e) of an event e are all met at occurrence point t, e must be in $\chi$ at t.

3.2  DiscoverHistory  algorithm 

The DiscoverHistory algorithm (Molineaux et al., 2011) finds a plausible
explanation that is consistent with an execution history. At first, our agent makes a set of
“optimistic” closed-world assumptions, assuming that the obstacles it knows about are
not present, and makes a plan based on those assumptions. Later, when an anomaly
occurs, it can revisit these assumptions and replan. An anomaly occurs as a result of one
of two problems: (1) there are hidden facts and/or event types present in the
environment for which the agent has no model, or (2) the agent’s initial optimistic assumptions
are false. The first problem requires learning, which we address in Section 6. The
second problem means that the agent can improve its environment model by correcting its
explanation, which will require changing its assumptions.

The agent maintains a current explanation at all times. As it executes actions, the
agent makes predictions about what events will occur, according to its observations
and assumptions, and adds those events to its explanation. However, in a partially
observable, dynamic world, events occur that the agent is unable to predict, which cause
anomalies. The agent recognizes an anomaly because one or more inconsistencies
appear in its explanation. As described in Section 3.1, an inconsistency occurs when a
fact is true after a relevant occurrence, and false before the next relevant occurrence.
When these inconsistencies appear, DiscoverHistory resolves them, using one of
the following three inconsistency resolution methods.

1. Event addition. This searches for a situation where an event that was not predicted
caused a literal’s value to change. The new event must occur before the later
occurrence, so as to explain the value at that time, but not before the earlier occurrence.

2. Event removal. This searches for a situation where one of the predicted events did
not happen as predicted.

3. Assumption change. Assumptions are only made about the initial state in our
explanations, since all following states can be determined from the initial state in a
deterministic world. Any event prior to the inconsistency would provide more
information about the truth of the literal in question. Therefore, this resolution method
is only applied when no relevant prior event is present in the explanation.

Each of these resolution methods can introduce new inconsistencies. Thus,
DiscoverHistory continues resolving inconsistencies until a consistent explanation is found, or
until a maximum number M of changes have been made. This search exhaustively
enumerates all explanations within M applications of resolution methods.
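
The bounded search just described can be sketched as a breadth-first enumeration. This is a minimal illustration under assumptions, not the published algorithm: explanations are reduced to sets of named changes, the two callables stand in for inconsistency detection and for the resolution methods, each node branches only on its first inconsistency, and the CAUSES table and all function names are invented for the example.

```python
from typing import Callable, FrozenSet, List

Explanation = FrozenSet[str]


def bounded_explanation_search(
    initial: Explanation,
    inconsistencies: Callable[[Explanation], List[str]],
    refinements: Callable[[Explanation, str], List[Explanation]],
    max_changes: int,
) -> List[Explanation]:
    """Return every consistent explanation reachable within max_changes (M)
    applications of a resolution method."""
    frontier = [initial]
    seen = {initial}
    consistent = []
    for depth in range(max_changes + 1):
        next_frontier = []
        for explanation in frontier:
            problems = inconsistencies(explanation)
            if not problems:
                consistent.append(explanation)
                continue
            if depth == max_changes:
                continue  # change budget M exhausted on this branch
            # Branch on each way of resolving the first inconsistency.
            for refined in refinements(explanation, problems[0]):
                if refined not in seen:
                    seen.add(refined)
                    next_frontier.append(refined)
        frontier = next_frontier
    return consistent


# Toy problem: each unexplained fact can be resolved by one of several changes.
CAUSES = {
    "wheels-stuck": ["add-event:pit-traps-rover", "assume:stuck-initially"],
    "rover-in-pit": ["assume:pit-at-goal"],
}


def toy_inconsistencies(explanation: Explanation) -> List[str]:
    return [fact for fact, causes in CAUSES.items()
            if not any(c in explanation for c in causes)]


def toy_refinements(explanation: Explanation, fact: str) -> List[Explanation]:
    return [explanation | {cause} for cause in CAUSES[fact]]


for found in bounded_explanation_search(
        frozenset(), toy_inconsistencies, toy_refinements, max_changes=2):
    print(sorted(found))
```

With max_changes=1 the same call returns no explanation, because two changes are needed to resolve both toy inconsistencies.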

When a plausible explanation is found, the assumptions made by that explanation
are adopted and can be used in prediction. This results in fewer surprises. Since many
such surprises are deleterious, goals can be achieved more often, as shown in Section 5.
A more detailed explanation of DiscoverHistory is in (Molineaux et al., 2011).


4  The  Hazardous  Rover  Domain 

Inspired by the Mars rovers,[1] which operate in a very hazardous environment, we
constructed a domain for testing that is predictable enough to make planning tractable, but
requires significant explanation to perform well. In our domain, the only actions are to
move in one of the four cardinal directions, and the only goals are to reach a location on
a six-by-six grid. However, complications can prevent a rover from reaching its goal:

1.  Sandstorms,  which  obscure  visibility  and  can  blow  a  moving  rover  off  course 

2.  Hidden  sand  pits,  which  trap  a  rover  and  prevent  it  from  moving 

3.  Compass  malfunctions,  which  reverse  a  rover’s  navigational  system,  causing  it  to 
move  north  when  it  intended  to  move  south 

Each of these obstacles provides a different challenge for learning. Consider the
problem of a rover that has not been programmed to recognize a sandstorm. When the windy
season on Mars starts, the rover’s sensor mechanisms become covered in sand, and it
can no longer recognize its location. It can predict where it should be, but its predictions
may deviate more from reality when strong sandstorm winds blow it off course. When
the storm clears, the rover observes that it has not reached its desired location. This is an
opportunity to learn; the rover could explain that the unexpected change in its position
occurs only when the rover is covered in sand.

Next consider the case of a rover that finds itself in a hidden sand pit. The rover
observes that it cannot move any longer when it reaches a certain location. Later on,
another rover gets stuck nearby. Because both rovers experience the same result at the
same location, a rover could explain that rovers get stuck at that location.

Finally, consider the case of compass malfunctions. In early afternoon one day, the
rover attempts to move north, and finds that it ends up in a different location than the
one it expected, which could have been reached by going to the south. It tries moving
north repeatedly, and repeatedly observes a similar result, a location one to the south
of its last location. By finding the similarities between the transitions, the rover could
explain that an event occurred in early afternoon that affected its navigation.

[1] http://marsrover.nasa.gov/home/

With foreknowledge of these types of events, we have shown in prior work
(Molineaux et al., 2011) that an agent employing DiscoverHistory outperforms one that
only performs replanning and execution monitoring. We now investigate whether an
agent that can learn these event models would outperform one that could not.

5  Experiment:  Potential  for  Event  Learning 

To examine the potential effects of learning in this domain, we compare the performance
of several explanation-forming agents with different levels of knowledge about the
domain. We measure performance in this domain as the percentage of goals achieved. As
explained earlier, the Hazardous Rover Domain includes three types of obstacles. For
each obstacle, one event related to understanding that obstacle is considered as a target
for learning. With regard to sandstorms, the event that causes a rover to be blown off course
in a sandstorm is sandstorm-blows; for sand pits, the event that causes the area around
a hidden sand pit to appear abnormally rough is pit-causes-rough; the event that causes
a compass to malfunction is called compass-malfunction-occurs. Eight agents using the
DiscoverHistory algorithm are compared. The first agent has knowledge of none
of these events; three agents know about one; three agents know about two; and the
last agent has knowledge of all three. If learning is effective in this domain, we expect
improvement from an agent with less knowledge to an agent with more.

To  compare  the  agents,  we  randomly  generated  25  scenarios  in  the  hazardous  rover 
domain.  In  each  scenario,  three  rovers  are  each  assigned  a  random  initial  location  and 
a  random  goal  that  requires  at  least  four  moves  to  reach  from  the  starting  location. 
Each  location  on  the  six-by-six  grid  contains  a  sand  pit  with  probability  0.3,  and  with 
probability  0.66  the  sand  pit  will  be  hidden.  At  each  possible  time  point,  a  storm  may 
begin  with  probability  0.2,  unless  a  storm  is  ongoing  at  that  time.  Finally,  each  rover 
has  a  compass  malfunction  within  the  first  100  time  steps.  The  time  step  is  chosen  by 
taking  the  floor  of  the  real  number  found  by  squaring  a  number  randomly  distributed 
between  0  and  10.  This  biases  the  event  to  occur  early  in  a  scenario  execution. 
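
To see how strongly this sampling scheme favors early malfunctions, note that the step is below k exactly when the uniform draw is below sqrt(k), so the probability of a malfunction before step k is sqrt(k)/10; in particular, half of all malfunctions fall within the first 25 steps. The sketch below checks this empirically. The function names, the seed, and the sample size are choices made for the illustration, and clamping the boundary draw to step 99 is an assumption.

```python
import math
import random


def sample_malfunction_step(rng: random.Random) -> int:
    """Floor of the square of a number drawn uniformly from [0, 10]."""
    u = rng.uniform(0.0, 10.0)
    # min() guards the boundary draw u == 10.0, keeping the step within 0..99.
    return min(math.floor(u * u), 99)


def prob_step_below(k: int) -> float:
    """P(step < k) = P(u < sqrt(k)) = sqrt(k) / 10, for 0 <= k <= 100."""
    return math.sqrt(k) / 10.0


rng = random.Random(0)
steps = [sample_malfunction_step(rng) for _ in range(20000)]
early = sum(1 for s in steps if s < 25) / len(steps)
print(prob_step_below(25))  # 0.5
print(round(early, 2))      # empirical fraction, close to the value above
```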

Table  1  shows  the  difference  in  the  average  percentage  of  goals  achieved  between 
each  pair  of  agents  which  differ  only  by  the  knowledge  of  a  single  event.  When  all 
other  events  are  known,  the  difference  averages  13.8%,  meaning  that  the  additional 
knowledge  of  a  single  event  was  sufficient  to  achieve  success  instead  of  failure  in  13.8% 
of  cases.  The  performance  advantage  caused  by  all  3  events  together  is  32.0%,  with 
statistical significance at p < .0001. Nine of the 12 comparisons showed that knowledge
of  an  additional  event  improved  goal  achievement  at  a  statistically  significant  level, 
indicating  that  learning  event  models  is  a  promising  avenue. 
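
The summary figures in this paragraph can be re-derived from the entries of Table 1. The short check below transcribes the table (improvement in percentage points, and whether the entry is marked significant) and recomputes the 13.8% average and the nine-of-twelve count; the variable names are arbitrary.

```python
# Entries transcribed from Table 1: (improvement in points, significant?).
# Within each list, the first entry is the case where both other events
# are already known; the last is the case where no event is known.
table1 = {
    "storm-blows": [(18.6, True), (14.7, True), (16.0, True), (16.0, True)],
    "compass-malfunctions": [(12.0, True), (5.33, True), (9.32, True), (6.66, True)],
    "pit-causes-rough": [(10.7, True), (6.66, False), (4.00, False), (4.00, False)],
}

all_others_known = [rows[0][0] for rows in table1.values()]
mean_gain = sum(all_others_known) / len(all_others_known)

significant = sum(1 for rows in table1.values() for _, sig in rows if sig)
total = sum(len(rows) for rows in table1.values())

print(round(mean_gain, 1), significant, total)  # 13.8 9 12
```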

Table 1: Increase in goals achieved after adding event models. (N/S): non-significant.

(a) Effect of adding storm-blows

Known events                              Perf.
compass-malfunctions, pit-causes-rough    18.6%, p<.01
compass-malfunctions                      14.7%, p<.01
pit-causes-rough                          16.0%, p<.01
none                                      16.0%, p<.01

(b) Effect of adding compass-malfunctions

Known events                              Perf.
storm-blows, pit-causes-rough             12.0%, p<.01
storm-blows                               5.33%, p<.05
pit-causes-rough                          9.32%, p<.05
none                                      6.66%, p<.05

(c) Effect of adding pit-causes-rough

Known events                              Perf.
compass-malfunctions, storm-blows         10.7%, p<.05
compass-malfunctions                      6.66%, (N/S)
storm-blows                               4.00%, (N/S)
none                                      4.00%, (N/S)

6 Discovering Events

Now that we have shown the utility of event learning in improving the performance of
an explanation discovery agent in a partially observable, dynamic planning domain,
we consider the problem of inducing those events. This can be neatly divided into two
separate problems: discovering when something unexplained happens, and finding an
effect that explains it. In a first-order logic domain, the first problem results in a set of
literal preconditions, and the second in a set of literal effects. If found, such an event
could be used to explain the anomalies encountered by the agent. However, if all of
the preconditions of an event are hidden, and no evidence is available to determine
their values, then that event cannot be discovered. Therefore, we assume that at least
one precondition is observable. Similarly, since hidden effects are difficult to verify,
we assume that all events cause at least one observable effect. We discuss these two
problems, discovery of preconditions and effects, in Sections 6.1 and 6.2 respectively.

When events are discovered, we need metrics to use in evaluating and adopting those
events. We propose to use predictive power and explanatory power. If an event is
usually predicted before it occurs, and rarely predicted when it does not occur, it has high
predictive power. Similarly, if knowledge of an event enables explanation after it occurs,
and does not lead to false explanations when it has not occurred, it has high explanatory
power. While both are useful as metrics of success, they cannot be easily evaluated in
situ by an agent. However, an agent can approximate them by maintaining a history
of the conditions and results of prior predictions and explanations. By re-attempting
each prior prediction, the agent can evaluate how many predictions would have been
made incorrect by the inclusion of the event in the agent’s knowledge and how many
would have been made correct. Similarly, explanations can be re-attempted with the
new knowledge; failed explanations that become successful are successes, and
successful explanations for which no explanation is now possible are failures. By adopting a
confidence metric over predictive and explanatory power, an agent can decide when to
include a discovered event in its knowledge base for use in future explanation attempts.
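
One possible form of such a confidence metric is sketched below. The text does not specify the metric, so everything here is an assumption made for illustration: the Replay record, the net-gain score (newly correct minus newly incorrect, divided by the number of replays), the adoption rule, and the 0.1 threshold.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replay:
    ok_without: bool  # outcome with the old knowledge base
    ok_with: bool     # outcome after adding the candidate event model


def net_gain(replays: Sequence[Replay]) -> float:
    """(newly correct - newly incorrect) / total, a value in [-1, 1]."""
    if not replays:
        return 0.0
    gained = sum(1 for r in replays if r.ok_with and not r.ok_without)
    lost = sum(1 for r in replays if r.ok_without and not r.ok_with)
    return (gained - lost) / len(replays)


def should_adopt(predictions: Sequence[Replay],
                 explanations: Sequence[Replay],
                 threshold: float = 0.1) -> bool:
    # Hypothetical rule: neither kind of power may degrade, and the
    # average gain across both must reach the threshold.
    p, e = net_gain(predictions), net_gain(explanations)
    return min(p, e) >= 0.0 and (p + e) / 2 >= threshold


# Ten replayed predictions: three fixed, one broken, six unchanged.
predictions = ([Replay(False, True)] * 3 + [Replay(True, False)]
               + [Replay(True, True)] * 6)
# Five replayed explanations: two previously failed ones now succeed.
explanations = [Replay(False, True)] * 2 + [Replay(True, True)] * 3

print(net_gain(predictions), net_gain(explanations))  # 0.2 0.4
print(should_adopt(predictions, explanations))        # True
```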

6.1  Discovering  Event  Preconditions 

In the simplest case, all anomalies can be attributed to a single event. This may be
a useful assumption in many cases (e.g., when conditions gradually change, an agent
might be exposed to new types of events incrementally). When the windy season starts
on Mars, for example, the rover may already be prepared with knowledge of equipment
malfunctions and hidden sand pits. Its problem is to determine what is common among
all the anomalies that are encountered. The classification of anomaly-causing states
is a relational multi-instance learning task (Dietterich et al., 1997) over the complete
history of observations made by the agent; any sequence of occurrences that contains
an anomaly is represented as a “bag” of positive examples, and all other sequences are
included in a bag of negative examples. Each example consists of a set of literals known
to be true at one time.

As an example, suppose the rover moves to an area on Mars that is covered with
hidden sand pits, and surrounded by rough terrain. The rover frequently gets caught in
these pits, and each time the experience is anomalous, because it does not know how to
predict it. The rover builds up a memory of anomalies and the states that preceded them.
The rover solves the learning task described above to discover that each anomaly was
preceded by states in which rough ground was present and a sand pit was not nearby.
These two facts become the preconditions for a newly generated event, G1.
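
A very crude propositional stand-in for this learning step is sketched below; it is not a relational multi-instance learner. It keeps the literals that appear somewhere in every positive bag and rejects the result if that conjunction also matches a negative state. The function name and the literal names (rough-ground, no-pit-nearby, storm, daytime) are hypothetical, and taking the union within a bag ignores whether the literals actually co-occur in a single state.

```python
from typing import FrozenSet, List, Set

State = FrozenSet[str]  # literals known to be true at one time
Bag = List[State]       # the states of one anomaly-containing sequence


def candidate_preconditions(positive_bags: List[Bag],
                            negative_states: List[State]) -> Set[str]:
    """Literals seen somewhere in every positive bag, kept only if the
    resulting conjunction matches no negative state."""
    if not positive_bags:
        return set()
    per_bag = [set().union(*bag) for bag in positive_bags]
    common = set.intersection(*per_bag)
    if any(common <= state for state in negative_states):
        return set()  # too general: it would also fire without an anomaly
    return common


positive_bags = [
    [frozenset({"rough-ground", "no-pit-nearby", "storm"}),
     frozenset({"rough-ground", "no-pit-nearby"})],
    [frozenset({"rough-ground", "no-pit-nearby", "daytime"})],
]
negative_states = [
    frozenset({"no-pit-nearby", "daytime"}),
    frozenset({"storm"}),
]

print(sorted(candidate_preconditions(positive_bags, negative_states)))
# ['no-pit-nearby', 'rough-ground']
```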

6.2  Discovering  Event  Effects 

After discovering the conditions that caused the recognized anomalies, we can assume
an event model exists that is triggered by those conditions. We can now predict when
this event will occur, but its effects are still unknown. To find the effects of this event,
we can add it to the knowledge base as an event model with no effects, and apply a
modified version of DiscoverHistory with a new inconsistency resolution method.
This method will resolve inconsistencies whose durations overlap with the new event;
it will resolve them by assuming that the event has an effect that causes the
inconsistency. All such assumptions made in a plausible explanation constitute one possible set
of effects for the event model. This modified DiscoverHistory could be used to find
a set of plausible explanations for each recognized anomaly. Therefore, using
DiscoverHistory to explain all recognized anomalies will result in a set of possible event
models, which can be evaluated in terms of their predictive and explanatory power.

Continuing the hidden sand pits example (Section 6.1), the rover applies
DiscoverHistory to resolve inconsistencies. Suppose that the rover observes a literal
(wheels-stuck) which prevents the move action from working. The rover’s
explanation now includes a G1 event that occurred after moving to a rocky area. Now, the
inconsistency can be explained: (wheels-stuck) is an effect of the G1 event.

7  Conclusion 

Our empirical studies showed that learning event models can improve the performance
of an agent using DiscoverHistory in a partially observable dynamic environment.
We proposed a framework for acquiring these models from experience. Our next steps
will be to implement this framework and evaluate its ability to (1) discover unknown
event models and (2) improve an agent’s goal-achievement performance over time.